A hybrid technique for estimating frequency moments over data streams
Abstract
The problem of estimating the kth frequency moment $F_k$, for any non-negative integer k, over a data stream by looking at the items exactly once as they arrive, was considered in a seminal paper by Alon, Matias and Szegedy [1, 2]. They present a sampling-based algorithm to estimate $F_k$, for k ≥ 2, using space $\tilde{O}(n^{1-1/k})$. Coppersmith and Kumar [7] and [10], using different methods, present algorithms for estimating $F_k$ with space complexity $\tilde{O}(n^{1-1/(k-1)})$. In this paper, we present an algorithm for estimating $F_k$ with space complexity $\tilde{O}(n^{1-2/(k+1)})$, for k > 2, thereby improving on the space complexity of the algorithms in [1, 2, 7, 10] for k ≥ 4.
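To make the quantity concrete, the following is a minimal Python sketch of the frequency moment $F_k = \sum_i f_i^k$ and of the basic one-pass sampling estimator commonly attributed to Alon, Matias and Szegedy. It is not the hybrid algorithm of this paper. The function names (`exact_moment`, `ams_one_pass`, `estimate_moment`) and the plain averaging over independent trials are assumptions made for illustration only.

```python
import random
from collections import Counter


def exact_moment(stream, k):
    """Exact F_k: sum of (frequency ** k) over distinct items (uses full storage)."""
    return sum(f ** k for f in Counter(stream).values())


def ams_one_pass(stream, k, rng):
    """One basic sampling estimate of F_k, computed in a single pass.

    A stream position is chosen uniformly at random by reservoir sampling;
    r counts occurrences of the sampled item from that position onward.
    The estimate m * (r**k - (r-1)**k) has expectation F_k.
    """
    m = 0        # number of items seen so far
    item = None  # currently sampled item
    r = 0        # occurrences of `item` since it was sampled
    for x in stream:
        m += 1
        if rng.randrange(m) == 0:  # keep the new position with probability 1/m
            item, r = x, 1
        elif x == item:
            r += 1
    return m * (r ** k - (r - 1) ** k)


def estimate_moment(stream, k, trials=4000, seed=0):
    """Average independent estimates (illustrative; each trial re-reads the stream)."""
    rng = random.Random(seed)
    return sum(ams_one_pass(stream, k, rng) for _ in range(trials)) / trials
```

Averaging many independent copies only reduces the variance of the estimate; the space bounds quoted in the abstract come from analyzing how many copies are needed, which this sketch does not attempt to reproduce.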
Similar resources
Estimating Hybrid Frequency Moments of Data Streams
We consider the problem of estimating hybrid frequency moments of two-dimensional data streams. In this model, data is viewed as organized in matrix form $(A_{i,j})_{1 \le i,j \le n}$. The entries $A_{i,j}$ are updated coordinate-wise, in arbitrary order and possibly multiple times. The updates include both increments and decrements to the current value of $A_{i,j}$. The hybrid frequency moment $F_{p,q}(A)$ is defined...
Estimating Frequency Moments of Data Streams Using Random Linear Combinations
The problem of estimating the kth frequency moment $F_k$ for any nonnegative k, over a data stream by looking at the items exactly once as they arrive, was considered in a seminal paper by Alon, Matias and Szegedy [1, 2]. The space complexity of their algorithm is $\tilde{O}(n^{1-1/k})$. For k > 2, their technique does not apply to data streams with arbitrary insertions and deletions. In this paper, we present...
Better Bounds for Frequency Moments in Random-Order Streams
Estimating frequency moments of data streams is a very well-studied problem [1–3,9,12], and tight bounds are known on the amount of space that is necessary and sufficient when the stream is adversarially ordered. Recently, motivated by various practical considerations and applications in learning and statistics, there has been growing interest in studying streams that are randomly ordered [3,4...
Estimating Entropy of Data Streams Using Compressed Counting
The Shannon entropy is a widely used summary statistic, for example in network traffic measurement, anomaly detection, neural computations, and spike trains. This study focuses on estimating the Shannon entropy of data streams. It is known that Shannon entropy can be approximated by Rényi entropy or Tsallis entropy, which are both functions of the αth frequency moments and approach Shannon entropy a...
On Estimating Frequency Moments of Data Streams
Space-economical estimation of the pth frequency moment, defined as $F_p = \sum_{i=1}^{n} |f_i|^p$ for p > 0, is of interest in estimating all-pairs distances in a large data matrix [14], in machine learning, and in data stream computation. Random sketches formed by the inner product of the frequency vector $f_1, \ldots, f_n$ with a suitably chosen random vector were pioneered by Alon, Matias and Szegedy [1], and...
Publication date: 2004